Question Answering with SEMEX at TREC 2005
Abstract
We describe the SEMEX question-answering system and report its performance in the TREC 2005 Question Answering track. Since this was SEMEX's first year participating in the TREC evaluations, implementation teething pains were expected and indeed encountered. Nevertheless, performance on difficult factoid and list questions supported the question-answering approach that was implemented.

1 System Description

Our SEMEX (SEMantic EXtractor) tool is a test bed environment for evaluating and refining semantic extraction and question answering algorithms. SEMEX provides the graphical user interface shown in Figure 1 for viewing the intermediate results at key stages of the knowledge extraction process. As shown in the figure, the document being processed appears in the horizontal text area at the top. The six vertically oriented text areas below it display the intermediate results after the following key stages of the semantic extraction process:

1. Part of speech tagging
2. Partial parsing
3. Chunking
4. Sentence splitting
5. Resolution
6. Concept extraction

For the tagging component, SEMEX uses the Brill tagger [2], whose output is corrected for common tagger errors. Parsing is performed using Abney's Cass partial parser. SEMEX then applies a comprehensive set of empirically derived heuristics to build up phrases at the chunking stage. The resultant parse trees are simplified and reduced to atomic propositions in the sentence splitting stage. Syntactic roles are assigned to the propositions, and pronominal references are then resolved. Finally, concepts are extracted for each discourse entity identified in the resolved propositions. SEMEX is presently configured to implement concept nodes that link to the resolved propositions in which they appear.
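The six-stage flow described above, with each intermediate result kept for inspection, can be sketched roughly as follows. This is only an illustration of the staging idea, not SEMEX's code: the stage functions are hypothetical placeholders that wrap their input in the stage name, and `run_pipeline`, `make_placeholder`, and `STAGE_NAMES` are names assumed for this example.

```python
from typing import Callable, List, Tuple

# Stage names taken from the list above; the identifiers are assumptions.
STAGE_NAMES = [
    "pos_tagging",
    "partial_parsing",
    "chunking",
    "sentence_splitting",
    "resolution",
    "concept_extraction",
]

Stage = Callable[[str], str]


def run_pipeline(document: str, stages: List[Tuple[str, Stage]]) -> List[Tuple[str, str]]:
    """Run the stages in order and keep every intermediate result."""
    trace = []
    data = document
    for name, stage in stages:
        data = stage(data)          # output of one stage feeds the next
        trace.append((name, data))  # retained so it can be displayed later
    return trace


def make_placeholder(name: str) -> Stage:
    """Hypothetical stand-in for a real stage: wraps the input in the stage name."""
    def stage(data: str) -> str:
        return f"{name}({data})"
    return stage


stages = [(name, make_placeholder(name)) for name in STAGE_NAMES]
trace = run_pipeline("Discovery is a space shuttle.", stages)

for name, output in trace:
    print(f"{name}: {output}")
```

Keeping the whole trace, rather than only the final output, mirrors the way the interface shows one text area per stage.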
The resolved propositions are represented as vectors whose components correspond to the key syntactic roles:

< subject, verb, gerund/infinitive, adverbials, indirect_object, direct_object >

The concept nodes are further organized into a hierarchy of “is-a” relationships that are derived from both the proposition set and the concept phrases themselves. Thus, a proposition for “space shuttle Discovery” will have a parent link to “space shuttle” which, in turn, will link to “shuttle.”

SEMEX provides text fields at the top of the GUI for entering the target and question, as shown in Figure 1. The figure also shows components for question number, year, run tag, and all question selection, which are important for TREC batch mode execution, but are not used for ad hoc processing.

As presently configured, SEMEX processes a question first by resolving any pronouns using the target, and then by tagging and parsing the question to produce a question vector or boolean combination of vectors of the same form shown above for propositions, except that the expected answer is replaced with a variable. The text field below the question shows the question vectors when the “analyze” button is pressed. The same field displays the answer when the “answer” button is pressed.

To produce an answer, SEMEX performs a unification of the question vector or vectors with the relevant vectors retrieved from the concept hierarchy for the particular question. WordNet [3] was used in the unification process to improve recall.
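As a minimal sketch of the ideas above, the following represents a proposition as a six-role vector, links concept nodes by is-a parents, and unifies a question vector containing a variable against stored propositions. Several things here are assumptions made for illustration and are not taken from SEMEX: the class and function names, the `?x` notation for variables, treating an empty question slot as unconstrained, case-insensitive exact matching in place of WordNet-based matching, and the toy proposition itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

ROLES = ("subject", "verb", "gerund_infinitive", "adverbials",
         "indirect_object", "direct_object")


@dataclass(frozen=True)
class Proposition:
    """One vector: a value (or None) for each syntactic role."""
    subject: Optional[str] = None
    verb: Optional[str] = None
    gerund_infinitive: Optional[str] = None
    adverbials: Optional[str] = None
    indirect_object: Optional[str] = None
    direct_object: Optional[str] = None


@dataclass
class ConceptNode:
    """A concept with an is-a parent and the propositions it appears in."""
    phrase: str
    parent: Optional["ConceptNode"] = None
    propositions: List[Proposition] = field(default_factory=list)


def ancestors(node: ConceptNode) -> List[str]:
    """Follow parent links upward and return the phrases encountered."""
    chain = []
    while node.parent is not None:
        node = node.parent
        chain.append(node.phrase)
    return chain


def is_variable(term: Optional[str]) -> bool:
    # Assumption: variables are written with a leading "?".
    return isinstance(term, str) and term.startswith("?")


def unify(question: Proposition, fact: Proposition) -> Optional[Dict[str, str]]:
    """Return variable bindings if the question matches the fact, else None."""
    bindings: Dict[str, str] = {}
    for role in ROLES:
        q, f = getattr(question, role), getattr(fact, role)
        if q is None:
            continue                      # unconstrained slot (assumption)
        if f is None:
            return None                   # question needs a role the fact lacks
        if is_variable(q):
            if bindings.setdefault(q, f) != f:
                return None               # same variable bound to two values
        elif q.lower() != f.lower():
            return None                   # exact match only; no WordNet here
    return bindings


def answer(question: Proposition, node: ConceptNode) -> List[Dict[str, str]]:
    """Unify the question with every proposition linked to the node."""
    results = []
    for fact in node.propositions:
        bindings = unify(question, fact)
        if bindings is not None:
            results.append(bindings)
    return results


# Toy data: the is-a chain from the text plus one illustrative proposition.
shuttle = ConceptNode("shuttle")
space_shuttle = ConceptNode("space shuttle", parent=shuttle)
discovery = ConceptNode("space shuttle Discovery", parent=space_shuttle)
discovery.propositions.append(
    Proposition(subject="space shuttle Discovery", verb="deployed",
                direct_object="Hubble Space Telescope"))

question = Proposition(subject="space shuttle Discovery", verb="deployed",
                       direct_object="?x")
print(ancestors(discovery))
print(answer(question, discovery))
```

A boolean combination of question vectors could be handled on top of `unify` by intersecting or merging the binding sets it returns, though how SEMEX combines them is not specified here.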
Similar resources
Syntax-based Concept Extraction for Question Answering Using SEMEX
The SEMEX tool for question answering is presented. Its architecture and features for extracting from input text a network of concept nodes that index syntax-based logical forms, are described. Methods are shown for decomposing questions into boolean combinations of question patterns and for using the concept network and logical forms together with WordNet for question answering. SEMEX's encour...
QACTIS-based Question Answering at TREC 2005
The QACTIS system is being developed for the eventual purpose of providing a user the capability of multilingual question-answering from multimedia. QACTIS was tested at TREC-2005 as a means of identifying its successes and limitations in answering questions specifically from English newswire text as it moves in the direction of multilingual, multimedia question answering. In this paper, we pr...
Question Answering with QED at TREC 2005
This report describes the system developed by the University of Edinburgh and the University of Sydney for the TREC-2005 question answering evaluation exercise. The backbone of our question-answering platform is QED, a linguistically-principled QA system. We experimented with external sources of knowledge, such as Google and Wikipedia, to enhance the performance of QED, especially for reranking...
QuALiM at TREC 2005: Web-Question Answering with FrameNet
In this paper I describe my TREC 2005 participation. The system used was, except for one new module, the same as in TREC 2004. In the following I will describe this new module, which uses the annotated Natural Language data collected in the FrameNet project in order to find paraphrases to answer questions. I will furthermore present and discuss the TREC 2005 results and compare them to those ach...
Question Answering Using the DLT System at TREC 2005
Factoids were the first type of question to appear at TREC and they are still the most frequent, with 362 appearing in the current test collection. Each asks for a single piece of information such as a name or a date. The key to our strategy in processing factoids (in common with most other participants) is to predict the expected type of answer (e.g. a date) from the type of question (e.g. ‘Wh...